DSC 140B

Problem #166

Tags: multiple outputs, lecture-16, softmax

A neural network with 3 output nodes uses the softmax activation function. The pre-activation values (logits) at the output layer are $\vec z = (0, 2, 0)$. Compute the softmax output $\vec h = (h_1, h_2, h_3)$.

Leave your answer in terms of $e$.

Solution

\[\vec h = \left(\frac{1}{2 + e^2},\;\frac{e^2}{2 + e^2},\;\frac{1}{2 + e^2}\right)\]

By the softmax formula, $h_k = \frac{e^{z_k}}{\sum_{j=1}^{3} e^{z_j}}$. The denominator is:

\[ e^{0} + e^{2} + e^{0} = 1 + e^2 + 1 = 2 + e^2 \]

Therefore:

$$\begin{align*} h_1 &= \frac{e^0}{2 + e^2} = \frac{1}{2 + e^2}\\ h_2 &= \frac{e^2}{2 + e^2}\\ h_3 &= \frac{e^0}{2 + e^2} = \frac{1}{2 + e^2}\end{align*}$$
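The result can be checked numerically. The sketch below is a minimal plain-Python softmax (the function name `softmax` and the variable names are illustrative, not from the course materials); it compares the computed output against the closed-form answer above.

```python
import math

def softmax(z):
    """Return the softmax of a list of logits."""
    exps = [math.exp(v) for v in z]   # e^{z_k} for each output node
    total = sum(exps)                 # shared denominator
    return [x / total for x in exps]

h = softmax([0.0, 2.0, 0.0])

# Closed-form answer from the solution above.
denom = 2 + math.e ** 2
expected = [1 / denom, math.e ** 2 / denom, 1 / denom]

print(h)
print(all(math.isclose(a, b) for a, b in zip(h, expected)))
```

The second `print` reports whether the numerical and closed-form answers agree up to floating-point tolerance.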

Problem #167

Tags: multiple outputs, lecture-16, softmax

A neural network with 4 output nodes uses the softmax activation function. The pre-activation values (logits) at the output layer are $\vec z = (1, 3, 1, 3)$. Compute the softmax output $\vec h = (h_1, h_2, h_3, h_4)$.

Leave your answer in terms of $e$.

Solution

\[\vec h = \left(\frac{1}{2(1 + e^2)},\;\frac{e^2}{2(1 + e^2)},\;\frac{1}{2(1 + e^2)},\;\frac{e^2}{2(1 + e^2)}\right)\]

By the softmax formula, $h_k = \frac{e^{z_k}}{\sum_{j=1}^{4} e^{z_j}}$. The denominator is:

\[ e^{1} + e^{3} + e^{1} + e^{3} = 2e + 2e^3 = 2e(1 + e^2) \]

Therefore:

$$\begin{align*} h_1 = h_3 &= \frac{e}{2e(1 + e^2)} = \frac{1}{2(1 + e^2)}\\ h_2 = h_4 &= \frac{e^3}{2e(1 + e^2)} = \frac{e^2}{2(1 + e^2)}\end{align*}$$
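The common factor of $e$ cancelling above reflects a general property: adding the same constant to every logit leaves the softmax output unchanged. The sketch below illustrates this by subtracting $\max_k z_k$ before exponentiating, a common numerical-stability trick that is assumed here for illustration and is not part of the problem. The names `softmax` and `softmax_shifted` are hypothetical.

```python
import math

def softmax(z):
    """Softmax computed directly from the logits."""
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [x / total for x in exps]

def softmax_shifted(z):
    """Softmax computed after subtracting max(z) from every logit."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]   # largest exponent is now e^0 = 1
    total = sum(exps)
    return [x / total for x in exps]

z = [1.0, 3.0, 1.0, 3.0]
direct = softmax(z)
shifted = softmax_shifted(z)

# Both versions should agree up to floating-point rounding.
same = all(math.isclose(a, b) for a, b in zip(direct, shifted))
print(same)
```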

Problem #168

Tags: multiple outputs, lecture-16, softmax

A neural network with 3 output nodes uses the softmax activation function. The pre-activation values (logits) at the output layer are $\vec z = (1, 2, 3)$. Compute the softmax output $\vec h = (h_1, h_2, h_3)$.

Leave your answer in terms of $e$.

Solution

\[\vec h = \left(\frac{1}{1 + e + e^2},\;\frac{e}{1 + e + e^2},\;\frac{e^2}{1 + e + e^2}\right)\]

By the softmax formula, $h_k = \frac{e^{z_k}}{\sum_{j=1}^{3} e^{z_j}}$. The denominator is:

\[ e^{1} + e^{2} + e^{3} = e(1 + e + e^2) \]

Therefore:

$$\begin{align*} h_1 &= \frac{e}{e(1 + e + e^2)} = \frac{1}{1 + e + e^2}\\ h_2 &= \frac{e^2}{e(1 + e + e^2)} = \frac{e}{1 + e + e^2}\\ h_3 &= \frac{e^3}{e(1 + e + e^2)} = \frac{e^2}{1 + e + e^2}\end{align*}$$
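As a quick sanity check, the three outputs should sum to 1:

\[ h_1 + h_2 + h_3 = \frac{1 + e + e^2}{1 + e + e^2} = 1 \]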

Problem #169

Tags: binary cross-entropy, lecture-16, multiple outputs

A multi-label classifier has 3 output nodes with sigmoid activations. The true labels are $\vec y = (1, 0, 1)$ and the predicted probabilities are $\vec h = (0.9, 0.2, 0.8)$.

Compute the binary cross-entropy loss. Leave your answer in terms of $\log$.

Solution

$-\log(0.9) - \log(0.8) - \log(0.8) = -\log(0.9) - 2\log(0.8)$.

By the binary cross-entropy formula:

\[\ell(\vec h, \vec y) = -\sum_{k=1}^{3}\begin{cases} \log h_k, & \text{if } y_k = 1 \\ \log(1 - h_k), & \text{if } y_k = 0 \end{cases}\]

Evaluating each term:

$$\begin{align*} k = 1&: \quad y_1 = 1, \text{ so } -\log(0.9) \\ k = 2&: \quad y_2 = 0, \text{ so } -\log(1 - 0.2) = -\log(0.8) \\ k = 3&: \quad y_3 = 1, \text{ so } -\log(0.8) \end{align*}$$

The total is $-\log(0.9) - 2\log(0.8)$.
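The term-by-term evaluation can be sketched in code. This is a minimal illustration that assumes $\log$ means the natural logarithm; the function name `binary_cross_entropy` is hypothetical.

```python
import math

def binary_cross_entropy(h, y):
    """Sum of per-output binary cross-entropy terms (natural log assumed)."""
    loss = 0.0
    for h_k, y_k in zip(h, y):
        if y_k == 1:
            loss -= math.log(h_k)        # label is 1: penalize low h_k
        else:
            loss -= math.log(1 - h_k)    # label is 0: penalize high h_k
    return loss

h = [0.9, 0.2, 0.8]
y = [1, 0, 1]
loss = binary_cross_entropy(h, y)

# Closed-form answer from the solution above.
expected = -math.log(0.9) - 2 * math.log(0.8)
print(loss, math.isclose(loss, expected))
```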

Problem #170

Tags: binary cross-entropy, lecture-16, multiple outputs

A multi-label classifier has 4 output nodes with sigmoid activations. The true labels are $\vec y = (0, 1, 0, 1)$ and the predicted probabilities are $\vec h = (0.3, 0.7, 0.1, 0.9)$.

Compute the binary cross-entropy loss. Leave your answer in terms of $\log$.

Solution

$-2\log(0.7) - 2\log(0.9)$.

By the binary cross-entropy formula:

\[\ell(\vec h, \vec y) = -\sum_{k=1}^{4}\begin{cases} \log h_k, & \text{if } y_k = 1 \\ \log(1 - h_k), & \text{if } y_k = 0 \end{cases}\]

Evaluating each term:

$$\begin{align*} k = 1&: \quad y_1 = 0, \text{ so } -\log(1 - 0.3) = -\log(0.7) \\ k = 2&: \quad y_2 = 1, \text{ so } -\log(0.7) \\ k = 3&: \quad y_3 = 0, \text{ so } -\log(1 - 0.1) = -\log(0.9) \\ k = 4&: \quad y_4 = 1, \text{ so } -\log(0.9) \end{align*}$$

The total is $-2\log(0.7) - 2\log(0.9)$.
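The case-by-case formula is often written as a single expression. Because each $y_k$ is either 0 or 1, exactly one of the two terms survives for each $k$:

\[\ell(\vec h, \vec y) = -\sum_{k=1}^{4}\left[\, y_k \log h_k + (1 - y_k)\log(1 - h_k) \,\right]\]

With $\vec y = (0, 1, 0, 1)$ the surviving terms are $\log(1 - h_1)$, $\log h_2$, $\log(1 - h_3)$, and $\log h_4$, matching the evaluation above.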

Problem #171

Tags: binary cross-entropy, lecture-16, multiple outputs

A multi-label classifier has 3 output nodes with sigmoid activations. The true labels are $\vec y = (1, 1, 0)$ and the predicted probabilities are $\vec h = (0.8, 0.6, 0.4)$.

Compute the binary cross-entropy loss. Leave your answer in terms of $\log$.

Solution

$-\log(0.8) - 2\log(0.6)$.

By the binary cross-entropy formula:

\[\ell(\vec h, \vec y) = -\sum_{k=1}^{3}\begin{cases} \log h_k, & \text{if } y_k = 1 \\ \log(1 - h_k), & \text{if } y_k = 0 \end{cases}\]

Evaluating each term:

$$\begin{align*} k = 1&: \quad y_1 = 1, \text{ so } -\log(0.8) \\ k = 2&: \quad y_2 = 1, \text{ so } -\log(0.6) \\ k = 3&: \quad y_3 = 0, \text{ so } -\log(1 - 0.4) = -\log(0.6) \end{align*}$$

The total is $-\log(0.8) - 2\log(0.6)$.
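Equivalently, the terms can be merged into a single logarithm using $\log a + \log b = \log(ab)$:

\[ -\log(0.8) - 2\log(0.6) = -\log(0.8 \cdot 0.6^2) = -\log(0.288) \]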

Problem #172

Tags: multiple outputs, lecture-16, categorical cross-entropy

A multi-class classifier has 4 output nodes with softmax activation. The true label is $\vec y = (0, 0, 1, 0)$ and the softmax outputs are $\vec h = (0.1, 0.2, 0.6, 0.1)$.

Compute the categorical cross-entropy loss. Leave your answer in terms of $\log$.

Solution

$-\log(0.6)$.

By the categorical cross-entropy formula:

\[\ell(\vec h, \vec y) = -\sum_{k=1}^{4}\begin{cases} \log h_k, & \text{if } y_k = 1 \\ 0, & \text{if } y_k = 0 \end{cases}\]

Only $y_3 = 1$ contributes, so the loss is $-\log(h_3) = -\log(0.6)$.
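A minimal sketch of this computation, assuming a one-hot label and the natural logarithm (the function name `categorical_cross_entropy` is illustrative):

```python
import math

def categorical_cross_entropy(h, y):
    """Categorical cross-entropy for a one-hot label y (natural log assumed)."""
    # Only the entry with y_k == 1 contributes to the sum.
    return -sum(math.log(h_k) for h_k, y_k in zip(h, y) if y_k == 1)

h = [0.1, 0.2, 0.6, 0.1]
y = [0, 0, 1, 0]
loss = categorical_cross_entropy(h, y)

# Compare against the closed-form answer -log(0.6).
print(math.isclose(loss, -math.log(0.6)))
```

Note that the predicted probabilities of the incorrect classes do not enter the loss directly; they matter only through the constraint that the softmax outputs sum to 1.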

Problem #173

Tags: multiple outputs, lecture-16, categorical cross-entropy

A multi-class classifier has 3 output nodes with softmax activation. The true label is $\vec y = (0, 1, 0)$ and the softmax outputs are $\vec h = (0.3, 0.5, 0.2)$.

Compute the categorical cross-entropy loss. Leave your answer in terms of $\log$.

Solution

$-\log(0.5) = \log 2$.

By the categorical cross-entropy formula:

\[\ell(\vec h, \vec y) = -\sum_{k=1}^{3}\begin{cases} \log h_k, & \text{if } y_k = 1 \\ 0, & \text{if } y_k = 0 \end{cases}\]

Only $y_2 = 1$ contributes, so the loss is $-\log(h_2) = -\log(0.5) = \log 2$.

Problem #174

Tags: multiple outputs, lecture-16, categorical cross-entropy

A multi-class classifier has 4 output nodes with softmax activation. The true label is $\vec y = (1, 0, 0, 0)$ and the softmax outputs are $\vec h = (0.4, 0.3, 0.2, 0.1)$.

Compute the categorical cross-entropy loss. Leave your answer in terms of $\log$.

Solution

$-\log(0.4)$.

By the categorical cross-entropy formula:

\[\ell(\vec h, \vec y) = -\sum_{k=1}^{4}\begin{cases} \log h_k, & \text{if } y_k = 1 \\ 0, & \text{if } y_k = 0 \end{cases}\]

Only $y_1 = 1$ contributes, so the loss is $-\log(h_1) = -\log(0.4)$.
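The same loss can be written compactly by using the one-hot label as a mask:

\[\ell(\vec h, \vec y) = -\sum_{k=1}^{4} y_k \log h_k = -\left(1 \cdot \log 0.4 + 0 \cdot \log 0.3 + 0 \cdot \log 0.2 + 0 \cdot \log 0.1\right) = -\log(0.4)\]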

Problem #175

Tags: multiple outputs, sigmoid, lecture-16, softmax

A neural network classifies images into 5 categories.

Part 1)

Suppose the categories are mutually exclusive (each image belongs to exactly one category). Which activation function should be used at the output layer: sigmoid or softmax?

Solution

Softmax.

Since the categories are mutually exclusive, we want the output probabilities to represent a single probability distribution over the 5 classes. Softmax enforces this by ensuring the outputs sum to 1.

Part 2)

True or False: with the activation from Part 1, the 5 outputs must sum to 1.


Solution

True.

The softmax function produces outputs that sum to 1 by definition:

\[\sum_{k=1}^{K} h_k = \sum_{k=1}^{K}\frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}} = \frac{\sum_{k=1}^{K} e^{z_k}}{\sum_{j=1}^{K} e^{z_j}} = 1 \]

Part 3)

Now suppose an image can belong to multiple categories simultaneously. Which activation function should be used at the output layer: sigmoid or softmax?

Solution

Sigmoid.

Since the categories are not mutually exclusive, each output node independently predicts the probability that the image belongs to that category. Sigmoid is applied independently to each output.

Part 4)

True or False: with the activation from Part 3, the 5 outputs must sum to 1.


Solution

False.

Sigmoid is applied independently to each output node, so there is no constraint that the outputs sum to 1. For example, if the image contains both a cat and a dog, the network might output high probabilities for both.
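The contrast between the two activations can be seen numerically. The sketch below applies both to the same set of logits; the logit values are hypothetical and chosen only for illustration.

```python
import math

def sigmoid(v):
    """Logistic sigmoid applied to a single logit."""
    return 1 / (1 + math.exp(-v))

def softmax(z):
    """Softmax applied jointly to all logits."""
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    return [x / total for x in exps]

# Hypothetical logits for 5 categories (not taken from the problem).
z = [2.0, 1.5, -1.0, -2.0, 0.0]

sigmoid_out = [sigmoid(v) for v in z]   # each output computed independently
softmax_out = softmax(z)                # outputs coupled through the denominator

print(sum(sigmoid_out))   # no constraint; generally not 1
print(sum(softmax_out))   # 1 up to floating-point rounding
```

With these logits, the first two sigmoid outputs are both high, which is the multi-label situation described above.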

Problem #176

Tags: multiple outputs, lecture-16, regression

A neural network with 3 output nodes is trained to predict temperature, humidity, and wind speed simultaneously. The network uses the multi-target regression loss:

\[\ell(\vec h, \vec y) = \|\vec h - \vec y\|^2 = \sum_{k=1}^{3}(h_k - y_k)^2 \]

For a particular data point, the network's predictions are $\vec h = (5, 3, 7)$ and the true values are $\vec y = (3, 4, 5)$. Compute the loss.

Solution

$9$.

$$\begin{align*}\ell(\vec h, \vec y) &= (5 - 3)^2 + (3 - 4)^2 + (7 - 5)^2 \\&= 4 + 1 + 4 \\&= 9 \end{align*}$$
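The same arithmetic as a short sketch (the function name `squared_error_loss` is illustrative):

```python
def squared_error_loss(h, y):
    """Squared Euclidean distance between predictions h and targets y."""
    return sum((h_k - y_k) ** 2 for h_k, y_k in zip(h, y))

h = [5, 3, 7]   # predictions: temperature, humidity, wind speed
y = [3, 4, 5]   # true values
loss = squared_error_loss(h, y)
print(loss)  # 9
```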
